Abstract

To understand the epigenetic regulation required for germ cell-specific gene expression in the mouse, we 
analysed DNA methylation profiles of developing germ cells using a microarray-based assay adapted for a 
small number of cells. The analysis revealed differentially methylated sites between cell types tested. Here, 
we focused on a group of genomic sequences hypomethylated specifically in germline cells as candidate 
regions involved in the epigenetic regulation of germline gene expression. These hypomethylated sequences 
tend to be clustered, forming large (10 kb to ~9 Mb) genomic domains, particularly on the X chromosome of
male germ cells. Most of these regions, designated here as large hypomethylated domains (LoDs), correspond 
to segmentally duplicated regions that contain gene families showing germ cell- or testis-specific expression, 
including cancer testis antigen genes. We found an inverse correlation between DNA methylation level and 
expression of genes in these domains. Most LoDs appear to be enriched with H3 lysine 9 dimethylation, 
usually regarded as a repressive histone modification, although some LoD genes can be expressed in male 
germ cells. It thus appears that such a unique epigenomic state associated with the LoDs may constitute a 
basis for the specific expression of genes contained in these genomic domains. 
1. Introduction

Among the many cells that constitute an organism, 
only the germ cell can transmit its genetic information 
to the next generation. To achieve this vital function, 
germ cells must possess a distinctive gene transcription 
programme and epigenomic features unique to this cell 
lineage. It is accepted that, during development, germ 
cells undergo marked changes in their global epigenetic 
status, termed 'epigenetic reprogramming'. 1,2 However, 
details of the epigenetic changes and their relationship 
to the establishment of germ cell-specific gene expression
have been poorly characterized, possibly for technical
reasons. The number of developing primordial
germ cells (PGCs) in embryos is limited, 3 restricting the
conventional analysis of epigenetic status. Most of the
genome-scale studies on DNA methylation have
focused on CpG islands (CGIs) or gene promoters (e.g.
Borgel et al. 4 ). Currently, the importance of DNA methylation
of other parts of genes, non-genic or intergenic
regions for establishing gene expression and the 
nuclear organization of chromatin is increasingly recog- 
nized. 5-7 Although recent research has advanced our 
understanding of the PGC epigenome, 8-11 further 
studies are still required to gain more detailed informa- 
tion on epigenomic features of germline cells and 
their involvement in defining germ cell-specific gene 
expression. 
In this study, we used a proven method of DNA methylation
analysis called the HELP assay (HpaII tiny fragment
enrichment by ligation-mediated polymerase chain
reaction (PCR)). 12-14 Because this method uses linker-mediated
PCR, it can be adapted for the small-scale
analysis of developing germ cells. Oda et al. 14 developed
an improved version of the method, nanoHELP. Here, we
have fine-tuned the protocol further. This modified
nanoHELP method provides a global analysis of the
DNA methylation status of CCGG sites using only a
sub-nanogram (≥0.5 ng) quantity of genomic DNA. The
custom-made genomic microarray used in this study is 
unusual in that the CCGG sites of the intergenic 
regions as well as the promoters and gene bodies of the 
RefSeq genes could be tested. This HELP microarray 
may provide new information about previously unex- 
plored parts of the germ cell epigenome. We applied 
this method to analyse DNA methylation in the mouse 
X chromosome. We reasoned that epigenomic features 
specific to germ cells can be found by focusing on the X 
chromosome, because the X chromosome carries 
many germ cell-expressed genes 15,16 and undergoes 
major epigenetic changes (e.g. X chromosome reactivation)
during germ cell development. 17 In this analysis,
we found for the first time a group of sequences that 
are specifically hypomethylated on the X chromosome 
of male germline cells. These sequences form relatively 
large genomic domains that harbour gene families displaying
specific expression in germ cells. We term these
regions large hypomethylated domains (LoDs). LoDs
have not been detected in previous studies, including 
recent whole-genome bisulphite sequence analyses, 
probably because mapping of bisulphite-converted 
short sequence reads onto locally duplicated regions 
such as LoDs is technically challenging. In contrast, the 
experimental design of the HELP assay, which involved 
removal of the potentially confounding effect of copy 
number difference 12 and inclusion of a probe design 
that selects unique sequences for hybridization, was 
effective in finding LoDs. Interestingly, many genes with 
homology to human cancer testis antigen (CTA) genes 
are contained within the LoDs. CTA genes are normally 
expressed only in the germline, and are also expressed 
in some tumour cell types. 18 The results presented in 
this study may shed light on the epigenetic basis for the 
germline gene expression programme and its relation- 
ship with oncogenesis. 

2. Materials and methods 

2.1. Sample preparation and purification of DNA and RNA

TMA5 cells are male embryonic stem (ES) cells
derived from the 129/Sv mouse. 19 The female ES#5
line was from F1 hybrid mice between
TgN(deGFP)20Imeg (RBRC No. 00822) and MSM/Ms
(RBRC No. 00209). The embryonic germ (EG) cell
lines used in this study were TMA55G (male) and
TMA58G (female). 19 These ES and EG cell lines were
cultured as described previously. 20 The Oct3/4-GFP
transgenic mouse line TgN(deGFP)18Imeg (RBRC No.
00821) 21 was used to collect PGCs from developing
mouse embryos, as described previously. 20 Germline
stem (GS) cells were obtained from the RIKEN Cell
Bank (RCB1968) and were cultured on a feeder layer
as described. 22 Germ cells expressing the Venus reporter
were purified by fluorescence-activated cell sorting
(FACS) from adult testis cells of the Mvh-Venus bacterial
artificial chromosome transgenic mouse line,
Tg(Mvh-Venus)1Rbrc (Mise and Abe, unpublished results). 23 All
animal experiments were approved by the Institutional
Animal Experiment Committee of the RIKEN
BioResource Center. DNA and RNA were extracted simultaneously
from the same samples using an AllPrep DNA/RNA
Micro Kit (Qiagen, Hilden, Germany). The quality of
RNA samples was checked using an Agilent 2100
Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).

2.2. Gene expression profiling 

A 44K custom microarray 24 was used for gene expression
profiling throughout this study. This custom array
covers all the known protein-coding genes as well as expressed
sequence tags derived from PGC cDNA libraries
(Abe, unpublished) and was manufactured by Agilent
Technologies. Total RNA was labelled with Cy3-CTP
using a Quick Amp Labeling Kit (Agilent Technologies).
Hybridization was performed according to the protocol
suggested by the supplier. Hybridized slides were
scanned using a microarray scanner (Agilent
Technologies), and the signals were processed with the
Feature Extraction software ver. 10.5.1.1 (Agilent
Technologies). The processed signal data were normalized
and analysed using the GeneSpring GX 11.5 software
(Agilent Technologies). The microarray experiments
were conducted using biologically duplicated samples.

2.3. Modified nanoHELP: linker-mediated amplification 
and hybridization 

The nanoHELP assay, a microarray-based DNA methylation
analysis, was performed according to our previous
reports 12,14 with modifications. Briefly, genomic DNA
(0.5-2 ng) was digested with HpaII or MspI in 100 µl of reaction
mixture at 37°C overnight. This was followed by
DNA purification with the MinElute Reaction Cleanup
Kit (Qiagen), and the digested DNA was ligated to
linker adapters, NHpaII12/NHpaII24 and JHpaII12/JHpaII24, 14
overnight at 16°C. After removing the linker
adapters, the ligated DNA was added to a 50-µl
PCR reaction mixture containing 1.5 µl each of 20 µM
primer (NHpaII24, 5'-GCAACTGTGCTATCCGAGGGAAGC-3';
JHpaII24, 5'-CGACGTCGACTATCCATGAACAGC-3'),
10 µl of 5 M betaine (Sigma-Aldrich), 200 µM of
dNTPs and 2.5 units of ExTaq DNA polymerase (TaKaRa
Bio, Inc., Otsu, Japan) in a buffer supplied by the manufacturer.
The mixture was heated at 72°C for 10 min
and subjected to PCR amplification with the following
parameters: 15 cycles at 95°C for 30 s and 72°C for
3 min, with a final extension at 72°C for 10 min. After
the first round of amplification, one-tenth of the
volume of the reaction was added to a fresh PCR reaction
mix containing the same primers, and amplified for an
additional 10-15 cycles 14,25 with the same PCR parameters
as described above. The PCR products were purified
using the MinElute Kit (Qiagen). An additional
column washing step with 750 µl of 35% guanidine
hydrochloride (Nacalai Tesque, Inc., Kyoto, Japan) solution
was performed to remove the residual primer-adapters.
The amplified DNA originally digested with
HpaII was labelled with Cy5-labelled Random 9-mers
(TriLink Biotechnologies, San Diego, CA, USA), and the
MspI-digested DNA was labelled with Cy3-Random 9-mers
(TriLink Biotechnologies). The labelled DNAs
were mixed and hybridized with a custom microarray
(Roche NimbleGen, Madison, WI, USA) using a
NimbleGen Array Hybridization Kit
(http://www.nimblegen.com/products/lit/lit.html). After washing
with the NimbleGen Array wash kit, the microarrays
were scanned on an Agilent Technologies Scanner
G2505C with a setting of 5-µm resolution. The HELP
array experiments were performed using biological replicates.
The raw data were processed using the
NimbleScan 2.4 data extraction software (NimbleGen)
to obtain the processed log2(Cy5/Cy3) ratio data.

Our previously published ChIP-on-chip data for
cumulus cells 26 were reanalysed and used. The ChIP-on-chip
experiment using GS cells was performed as
described in the same paper.

2.4. Microarray design 

The microarrays were designed to represent restric- 
tion fragments with 5'-CCGG restriction sites in a size 
range of 200-2000 bp (=CCGG segments) on 
mouse Chromosome 7 and the X chromosome. Ten 
50-mer oligonucleotide probes were designed from 
unique sequences in each CCGG segment, avoiding 
repeat-masked regions and sequence ambiguities. 
Probe sequences were selected using a score-based selection
algorithm, as described. 12 Detailed information
for the coverage of genomic regions on each chromo- 
some and annotations of the CCGG segments are 
described in Supplementary Table S1. Information 
about the positions of probes, M-values obtained from 
different samples and k-means cluster number are 
described in Supplementary Table S2, and gff files 
of the HELP array data are available at our web site
(http://www.brc.riken.go.jp/lab/mcd/mcd2/protocol/nanoHELP.html).
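
The selection of CCGG segments described in this section can be illustrated with a minimal in silico digestion sketch. It is not the authors' design pipeline: the function name `ccgg_segments` and its parameters are hypothetical, and probe selection, repeat masking and uniqueness filtering are omitted. The sketch assumes only that HpaII and MspI both cut within 5'-CCGG (C^CGG) and that fragments of 200-2000 bp are retained.

```python
import re


def ccgg_segments(sequence, min_len=200, max_len=2000):
    """Return (start, end) coordinates of fragments between adjacent CCGG sites.

    Coordinates are 0-based, end-exclusive. Only fragments whose length lies
    in [min_len, max_len] are kept, mirroring the 200-2000 bp size range.
    """
    seq = sequence.upper()
    # Positions of every CCGG recognition site on the given strand
    # (CCGG is palindromic, so one strand is sufficient).
    sites = [m.start() for m in re.finditer("CCGG", seq)]
    segments = []
    for left, right in zip(sites, sites[1:]):
        # Both enzymes cut after the first C (C^CGG), one base into the site.
        start, end = left + 1, right + 1
        if min_len <= end - start <= max_len:
            segments.append((start, end))
    return segments
```

For a toy sequence with CCGG sites 300 bp and 54 bp apart, only the 300-bp fragment would be retained as a CCGG segment.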

2.5. Data analysis 

The steps in the HELP data analysis are shown schematically
in Supplementary Fig. S1. Briefly, hybridization
signal noise is first removed from the processed
data by cutting off the values in the range of random-sequence
probes. In our microarray, 10 oligonucleotide
probes are normally assigned to each CCGG segment.
The median signal intensity of the 10 probes is calculated
and used to define the segment's signal intensity.
Using the median signal values, the HpaII/MspI ratio is
then calculated for each CCGG segment and converted
to a log2 value to obtain the M-value. After normalization
of the microarray ratio data, hypomethylated and
hypermethylated segments are distinguished using an
R script (http://www.r-project.org/) that determines
the threshold values based on a binarization
method. 27 The marginal width of the threshold is calculated
using the Mahalanobis distance. 28 The log2 value
at the threshold is set as 0, so that unmethylated segments
have a value of >0 and methylated segments
have a negative value (<0). For interarray normalization
of the HpaII/MspI ratio, the threshold value of
each array data set is scaled to 0. All the bioinformatic
analyses described here are based on the UCSC mm8
genome assembly (http://genome.ucsc.edu/).
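
The core of the M-value calculation described above can be sketched as follows. This is an illustrative simplification, not the R script used in the study: the function names and the `threshold` argument are hypothetical, the binarization method that determines the threshold is not reproduced, and noise removal and the marginal width around the threshold are omitted.

```python
import math
from statistics import median


def segment_m_value(hpaii_probes, mspi_probes, threshold=0.0):
    """M-value of one CCGG segment from its probe intensities.

    The segment signal in each channel is the median of its probes
    (normally 10 per segment). `threshold` is the array-specific log2
    cut-off, assumed here to be supplied by a separate binarization step;
    subtracting it rescales the threshold to 0.
    """
    hpaii = median(hpaii_probes)
    mspi = median(mspi_probes)
    return math.log2(hpaii / mspi) - threshold


def call_methylation(m_value):
    """Classify a segment by the sign of its threshold-centred M-value."""
    if m_value > 0:
        return "unmethylated"
    if m_value < 0:
        return "methylated"
    return "undetermined"  # exactly at the threshold
```

A segment whose median HpaII signal is four times its MspI signal would, with the threshold at 0, receive an M-value of 2 and be called unmethylated.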

2.6. Accession number 

Gene expression microarray data and the HELP array 
data are available at the Gene Expression Omnibus database
(http://www.ncbi.nlm.nih.gov/geo/) (Accession
number GSE39895).

3. Results 

3.1. DNA methylation analysis with subnanogram
amounts of genomic DNA

For epigenomic analyses of developing cells, materi- 
als can be very limited in quantity, precluding conven- 
tional analytical techniques. One of our goals was to 
describe comprehensively the epigenomic changes 
during the development of early embryos and germ 
cells in the mouse. Towards this goal, we used the HELP
assay, 12 a proven, microarray-based method, for the
analysis of DNA methylation. 13 The original HELP protocol
requires 10 µg of genomic DNA as the starting
material, 12 but Oda et al. 14 established an improved
version of the method, nanoHELP, for the analysis of a
limited amount of starting DNA. Here, we have fine-tuned
the protocol further for the analysis of 0.5-2 ng
of starting material. The details of this method
are described in Section 2, and the flow of the data
analysis is presented in Supplementary Fig. S1.
The HELP assay is a microarray-based method that
detects the subset of unmethylated HpaII fragments
in the genome and uses the corresponding, methylation-insensitive
MspI representations as a control.
The M-value, an index of the methylation level, is calculated
as log2(HpaII signal/MspI signal) as described
in Section 2: an unmethylated segment has a value of
>0 and a methylated segment has a negative value
(<0). As shown in Supplementary Fig. S2A-D, the modified
nanoHELP assay generated data that correlated well
with the data obtained by the original
protocol. To validate these results, bisulphite pyrosequencing
analysis of six CpG sites was performed as
described. 12 The results showed that the modified
nanoHELP assay could generate reliable data
(Supplementary Fig. S2E).

3.2. Custom HELP microarray used in this study 

The number of restriction sites for HpaII, 5'-CCGG, in
the mouse genome (UCSC mm8) is 1 588 546, covering
~7.5% of the total CpG dinucleotides in the
genome (Supplementary Table S1A). The CCGG sites
are distributed almost evenly over the mouse genome 
and do not show an apparent bias to a particular 
genomic context. Thus, the use of CCGG sites is suitable 
for obtaining a chromosome-wide view of CpG methy- 
lation profiles. 

In this experiment, we designed a custom microarray
harbouring 382 018 oligoprobes. We first selected
HpaII or MspI fragments (designated here as CCGG segments)
with a size range of 200-2000 bp, mostly
from mouse Chromosomes 7 and X, and designed 10
unique sequence probes of 50 nucleotides per CCGG
segment. The custom microarray can detect 22 128
CCGG segments on Chromosome 7 and 14 472 on
the X chromosome, which correspond to 46 and 39%
of the total segments on each chromosome, respectively.
The CCGG segments were selected in an unbiased
fashion except for a part of Chromosome
7 and a small number of segments associated with
some germ cell-related genes on Chromosomes 6,
11 and 12 (Supplementary Table S1B). Supplementary
Table S1C and D describe the categorization of the
CCGG segments based on the genome annotations.
About 47% of the segments map to intergenic regions,
5% to promoter regions and 46% to bodies of RefSeq
genes (Supplementary Table S1C). About 75% of the
X-linked RefSeq genes are covered by this HELP microarray
(Supplementary Table S1D). Because most of
the CCGG segments in CGIs are <200 bp, only ~1% of
CGIs annotated in the UCSC mm8 genome assembly
can be assayed by this array (Supplementary Table S1E).

CGIs and gene promoter regions have been the 
main targets in most DNA methylation studies. 
However, the importance of DNA methylation in 
genomic regions outside the promoters is becoming increasingly
apparent. 5,6 It is expected that this microarray
method should be appropriate for the analysis of
previously unexplored and potentially informative 
parts of the genome. 



3.3. Analysis of DNA methylation profiles of stem cells
and germline cells

We performed DNA methylation profiling of the following
samples: ES cells from male and female blastocysts,
male and female EG cells established from PGCs
at embryonic day 12.5 (E12.5), GS cells derived from
spermatogonia 22 and male and female PGCs purified
by FACS from Oct3/4-GFP transgenic embryos 20,21 at various
stages. PGCs were isolated from male and
female E10.5, E13.5 and E17.5 embryos. PGCs have
not yet entered the gonads at E10.5, but have colonized
the gonads by E13.5. At E17.5, male PGCs are
in mitotic arrest in the gonads, and female
PGCs are arrested in the early phase of meiosis. 3 We
also isolated germ cells from newborn ovary and testis.
Whole adult testis, thymus and brain were isolated
from male mice and used for the analysis. Germ cells
in the adult testis were purified by FACS from the Mvh
(mouse Vasa homolog)-Venus transgenic mouse (Mise
and Abe, unpublished results). 23 Gene expression profiling
of all samples was conducted using our custom
44K microarray.

Figure 1 shows the results of principal component 
analysis (PCA) and hierarchical cluster analysis of the 
DNA methylation profiles and the gene expression profiles.
In this comparison, pluripotent stem cells (i.e. ES
and EG cells) and PGCs from various stages show
similar but distinct expression profiles; ES and EG cells
are positioned more closely (blue circle) relative to
PGCs (red circle) (Fig. 1A and C). This result confirms
our previous findings that PGCs possess a transcription
programme distinct from that of ES cells, although both
share the expression of common 'signature genes'. 20
In contrast, analysis of the DNA methylation profiles
showed the differences between samples more clearly.
PGC samples could be classified into two groups: one
comprising female PGCs and early male PGCs (i.e.
E10.5 and E13.5; red circle), and the other comprising
E17.5 and P0.5 male germ cells, which formed a cluster
together with GS cells and testis (green circle) (Fig. 1B and D). Male
PGCs at different stages appeared to be more distantly
related to each other than were female PGCs, suggesting
that the DNA methylation profiles change more drastically
during male PGC development. These results
suggest that cell types can be classified by their DNA
methylation profiles and that, in some cases, DNA
methylation profiling can display differences in the
cellular state more effectively.
Figure 1. Profiling of gene expression and DNA methylation in germ cells and stem cells. (A) PCA of the expression profiles of germ cells, stem
cells and adult organs. ES_m (male ES cells), ES_f (female ES cells), EG_m (male EG cells), EG_f (female EG cells), 10.5m (PGCs from male
E10.5 embryos), 10.5f (female E10.5 PGCs), 13.5m (male E13.5 PGCs), 13.5f (female E13.5 PGCs), 17.5m (male E17.5 PGCs), 17.5f
(female E17.5 PGCs), P0.5m (spermatogonia from P0.5 neonates), P0.5f (oocytes from P0.5 neonates) and GS cells. Testis, thymus and
brain were isolated from male adult mice. (B) PCA of DNA methylation profiles of germ cells, stem cells and adult organs. (C)
Hierarchical clustering of gene expression profiles. (D) Hierarchical clustering of DNA methylation profiles.



3.4. k-means cluster analysis of DNA methylation
profiles to visualize cell type-specific differentially
methylated regions

To visualize differences in the DNA methylation profiles
of the samples tested, non-hierarchical k-means
analysis (k = 12) was performed using the data obtained
from the 28 217 informative CCGG segments; the result
is shown as a heat map in Fig. 2A. One of the most conspicuous
trends was that the genomes of the PGCs examined
are mostly hypomethylated except for male E17.5
PGCs. A box plot of the M-value for each sample is
shown in Fig. 2B. Global levels of DNA methylation are
lower in the E10.5 PGC genome than in ES and EG cells,
and the levels are even lower at E13.5. At E17.5, the
methylation level of male PGCs is increased, whereas
female PGCs maintain a hypomethylated status similar
to that at E10.5 or E13.5. The difference in the methylation
level between male and female germ cells is most
prominent in neonates: male spermatogonia have a
highly methylated genome. GS cells possess a similar
DNA methylation profile to that of P0.5 spermatogonia
(R = 0.80). Epigenetic features of GS cells have not
been reported to date, and this result suggests that GS
cells should provide a valuable in vitro model for epigenetic
studies of spermatogonial cells. Adult testis, which
comprises both germ and somatic cells, has a slightly
lower DNA methylation level relative to spermatogonia
(P0.5 male) and GS cells (Fig. 2B).

Although most of the PGC genomes are hypomethylated,
genomic regions classified as Cluster 12 remain
methylated at a level similar to that in the other cell 
types examined. The rest of the clusters showed some 
cell or tissue specificities in DNA methylation. 
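
To make the clustering step concrete, the following is a minimal sketch of non-hierarchical k-means (Lloyd's algorithm) applied to per-segment M-value profiles, where each profile is the list of M-values of one CCGG segment across samples. It is an assumption-laden illustration: the software, distance measure and initialization actually used for the k = 12 analysis are not stated in the text, and the function name and parameters here are hypothetical.

```python
import random


def kmeans(profiles, k, n_iter=100, seed=0):
    """Cluster equal-length M-value profiles into k groups.

    Uses squared Euclidean distance and random initial centroids drawn
    from the data (an assumption; not necessarily the study's settings).
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each segment joins its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids
```

In the analysis described above, `profiles` would hold one row per informative CCGG segment and `k` would be 12; segments that are hypomethylated only in germline samples would then be expected to fall into a common cluster.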

3.5. Characterization of germline-specific
hypomethylated CCGG segments on the X
chromosome

The clusters were then characterized by examining 
their tissue specificities in DNA methylation patterns. 
We noticed that Cluster 4 comprises the segments 
hypomethylated only in PGCs, GS cells and the testis, 
whereas these segments are hypermethylated in ES 
and EG cells, and somatic organs. These putative germline-specific
hypomethylated segments were characterized
further. Although no particular GO terms are
enriched, Cluster 4 contains genes expressed in the
germ cells of testis such as Xmr (Xlr-related, meiosis
regulated) 29,30 or CTA genes, e.g. the Mage (melanoma
antigen) gene family. 31 The number of Cluster 4 CCGG
segments mapped onto the X chromosome is
disproportionately high. There are 1004 segments in
Cluster 4, and 715 (71.2%) are on the X chromosome,
which has 14 472 CCGG segments in total. In contrast,
219 of Cluster 4 segments (21.8%) are on
Chromosome 7, which carries 22 128 segments.
Because genes expressed in germ cells or testis are
known to be enriched on the X chromosome, 15,16 we
focused on the Cluster 4 segments of the X chromosome
as candidates involved in the epigenetic regulation
of germ cell-specific gene expression.

Figure 2. DNA methylation dynamics during germ cell development. (A) k-means clustering of DNA methylation profiles of germ cells, stem
cells and adult organs. DNA methylation levels of the CCGG segments are represented as a heat map (unmethylated segments in dark blue,
M-value = 6.00; highly methylated segments in dark red, M-value = -6.00). (B) Changes in global DNA methylation levels during PGC
development. The M-value was calculated for each sample and is shown as a box plot. The bottom and top of the boxes are the 25th and
75th percentiles, respectively.
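
The disproportion in the chromosomal distribution of Cluster 4 segments can be checked with simple arithmetic on the counts reported in the text. The helper below is only a worked example; the function name is hypothetical, and the per-chromosome rate comparison is an illustrative summary rather than a statistic reported by the authors.

```python
def cluster4_enrichment(
    cluster4_total=1004,   # segments in Cluster 4
    cluster4_on_x=715,     # Cluster 4 segments on the X chromosome
    cluster4_on_chr7=219,  # Cluster 4 segments on Chromosome 7
    array_x=14472,         # CCGG segments on the array, X chromosome
    array_chr7=22128,      # CCGG segments on the array, Chromosome 7
):
    """Summarize how Cluster 4 segments are distributed between X and chr7."""
    share_x = 100.0 * cluster4_on_x / cluster4_total        # reported as 71.2%
    share_chr7 = 100.0 * cluster4_on_chr7 / cluster4_total  # reported as 21.8%
    # Fraction of each chromosome's array segments that fall in Cluster 4.
    rate_x = cluster4_on_x / array_x
    rate_chr7 = cluster4_on_chr7 / array_chr7
    return {
        "share_x_percent": share_x,
        "share_chr7_percent": share_chr7,
        "rate_ratio_x_vs_chr7": rate_x / rate_chr7,
    }
```

Although the X chromosome contributes fewer segments to the array than Chromosome 7, a segment on the X chromosome is roughly five times as likely to belong to Cluster 4 under this calculation.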

As shown in Fig. 3 and Supplementary Fig. S3A-C, we
plotted the M-values of all the CCGG segments along
the X chromosome using the data obtained from each
sample (grey dots). To translate the M-value measurements
into regions of equal M-value, we used a circular
binary segmentation programme, which is normally
used for comparative genomic hybridization analysis. 32
Using this programme, we drew lines (black
horizontal lines) to show regions of equal M-value. By
tracing the line, we could identify the genomic
regions in which the M-value changes significantly
from the flanking regions. The M-values of the segments
belonging to Cluster 4 are overlaid as red dots.
The distributions of the M-values in the DNA of
somatic cells (i.e. brain and thymus) along the entire X
chromosome are similar to each other: the average
M-value is less than -1, with some local exceptions.
The Cluster 4 dots are mapped even below the
average line, indicating that, as expected, Cluster 4 segments
are hypermethylated in both brain and thymus
(Fig. 3). In ES and EG cells (Fig. 3 and Supplementary
Fig. S3A), the average M-values of the Cluster 4 segments
do not change significantly along the X chromosome
and are positioned below -1, suggesting that
Cluster 4 segments are largely hypermethylated in
the genomes of ES or EG cells. In sharp contrast, in
E17.5 male PGC DNA, it appears that the average M-value
line is often discontinuous, and that
hypomethylated CCGG segments exist over relatively
large, contiguous genomic regions (Fig. 3). For
example, the average M-value of the segments within
the ~9 Mb genomic region harbouring the Xmr gene
cluster (double-headed arrows) is close to 0, and
Cluster 4 segments are enriched in this region.

Figure 3. Discovery of large, contiguous genomic regions with low DNA methylation modifications in male germ cells. The methylation profiles of
genomic DNA along the mouse X chromosome. Samples used for the analysis are indicated in each figure. The M-value of each CCGG segment
obtained from the analysis of each sample DNA is plotted on the mouse X chromosome (grey dots). The y-axis represents the M-value,
log2(HpaII/MspI). The black line was drawn using DNAcopy, a circular binary segmentation programme obtained from
http://www.bioconductor.org/packages/2.3/bioc/html/DNAcopy.html. 32 Red dots represent CCGG segments belonging to Cluster 4. Double-headed
arrows indicate the position of the Xmr gene cluster.

It is obvious that the distribution of the Cluster 4 seg- 
ments is not uniform, and that these Cluster 4 segments 
form 'hypomethylated domains' compared with their 
flanking regions. These trends persist in P0.5 spermato- 
gonia and GS cells derived from spermatogonia (Fig. 3), 
with a few cell type-specific differences. In testis DNA, 
the overall methylation pattern of the Cluster 4 seg- 
ments is essentially similar to that found in the male 
germline cells, described above (Fig. 3). We also examined
earlier stages of male PGCs (Supplementary Fig.
S3B). In E10.5 male PGCs, formation of hypomethylated
domains, e.g. the Xmr region, is not as obvious as
that seen in E17.5 male PGCs. In E13.5 male PGC DNA, the
distribution of the Cluster 4 segments is similar to
that found in E17.5 male PGCs. It thus appears that
clustering of hypomethylated DNA segments becomes
increasingly evident on the X chromosome during the
development of male germ cells.

3.6. Discovery of large genomic regions hypomethylated 
specifically in male germline cells 
Although it is clear that segment DNAs possess generally
lower methylation levels in female PGCs than in
cells of somatic organs, the formation of hypomethylated
DNA regions as seen in male PGCs is not evident
in female PGCs (Supplementary Fig. S3B and C). We
therefore decided to focus on the male germline-specific
hypomethylated DNA regions composed of Cluster 4
segments. To visualize the hypomethylated DNA
regions in the male germline from a different viewpoint,
we plotted fold differences in the methylation level
between somatic and male germ cell DNA along the X
chromosome (Fig. 4A). Because methylation patterns
of Cluster 4 segments are essentially similar in testis,
E17.5 and P0.5 male PGCs, the testis was chosen for
this analysis. Brain was used as the somatic tissue for
this analysis. As the data were plotted on a log2
scale, a negative value indicates a lower level of DNA
methylation in the testis than in the brain. The plot
revealed broad domains with lower methylation levels
in the testis and, therefore, in male germ cells of late
stages (coloured light blue in Fig. 4A). These broad
domains of hypomethylated DNA are
distinct from CGIs, which are generally located within
or near a promoter and have a typical length of
300-3000 bp. 33 The broad and hypomethylated domains
identified here are often much larger than CGIs and
do not show preferential localization at promoter
regions. Thus, these broad domains do not correspond
to the known hypomethylated regions and may represent
a hitherto unknown epigenomic entity. For convenience
of discussion, we here designate such a
broad and hypomethylated domain as an LoD. By definition,
an LoD has a size of >10 kb and shows more than
a 2-fold difference in M-value between germline and
somatic cells (testis and brain in this case). Each LoD
should also have at least one Cluster 4 segment.

Figure 4. Demonstration of LoDs on the mouse X chromosome. (A) Differences in methylation levels between brain and testis genomic DNA
along the mouse X chromosome. Fold changes in the methylation level, i.e. brain M-value versus testis M-value, were calculated for each
CCGG segment, and plotted using the log2 scale along the X chromosome. The blue lines indicate genomic regions showing more than a
2-fold difference between the brain M-value and testis M-value. Light blue represents hypomethylated regions in the testis relative to
the brain. Green dots at the bottom represent the positions of CGIs. (B) The positions of segmentally duplicated regions along the mouse
X chromosome. Segmentally duplicated regions >1000 bases with >98% similarity are counted, and the frequencies of duplications
(y-axis) are shown. (The data were from UCSC Genome Browser.) Grey bars represent duplications occurring on the X and other
chromosomes, and yellow bars represent the frequencies of duplications mapped only on the X chromosome. (C) Methylation analysis of
LoDs 10 and 12 by Southern blot hybridization. Genomic DNAs of the male thymus, male brain and testis were digested by either
methylation-sensitive HpaII (H) or its methylation-insensitive isoschizomer, MspI (M). The Southern blot was hybridized with a probe
targeted to LoDs 10 and 12. A primer pair (FW: 5'-GCTGGGTCCAGCTTCCCTGG-3', RV: 5'-TGGCACCCCTCCTCCCTGAT-3') was used to
amplify an 807-bp sequence using testis cDNA for generation of the probe. The 807-bp probe contains locally repeated sequences and
corresponds to both LoDs 10 and 12 located upstream of Mageb1/b2 genes. (D) Methylation analysis of LoDs 10 and 12 in purified
germ cells. Germ cells expressing the Mvh-Venus reporter were purified from adult testis by FACS. DNAs from purified germ cells and
whole testis were digested with MspI plus BamHI (M + B), HpaII plus BamHI (H + B) or BamHI only (B). A Southern blot was made using
these DNAs and hybridized with the same probe as used in Fig. 4C.
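
The three criteria in the LoD definition (size >10 kb, more than a 2-fold germline-versus-somatic difference, and at least one Cluster 4 segment) can be expressed as a simple filter. This sketch rests on an explicit assumption: because M-values are log2 ratios, a 2-fold difference is interpreted here as a difference of more than 1 between the testis and brain M-values. The text does not spell out this computation, and the `Region` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int         # genomic start coordinate (bp)
    end: int           # genomic end coordinate (bp)
    testis_m: float    # representative M-value in testis (germline)
    brain_m: float     # representative M-value in brain (somatic)
    n_cluster4: int    # number of Cluster 4 CCGG segments in the region


def is_lod(region, min_size=10_000, min_log2_diff=1.0):
    """Return True if a candidate region meets the three LoD criteria."""
    size = region.end - region.start
    # Assumption: "2-fold" on the log2 scale means a difference of 1.
    # A higher M-value in testis means lower methylation in the germline.
    log2_diff = region.testis_m - region.brain_m
    return (
        size > min_size
        and log2_diff > min_log2_diff
        and region.n_cluster4 >= 1
    )
```

For example, a 50-kb region with a testis M-value of 0.2, a brain M-value of -1.5 and three Cluster 4 segments would pass the filter, whereas the same region without any Cluster 4 segment would not.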

3.7. Overlap of LoDs with segmentally duplicated
regions

Using the definition described above, we list the LoDs
of the X chromosome in Table 1. There are 16 LoDs on
the X chromosome (Table 1), and their sizes are generally
large: 11 of the 16 LoDs are >100 kb (mean:
1 219 252 bp), and six of the large LoDs are >1 Mb. The
mammalian genome is replete with segmentally duplicated
regions. 34 Although segmental duplications can
be found on every chromosome, they are particularly
abundant on the sex chromosomes. Because LoDs are
generally large and contain gene families such as Xmr,
we asked whether LoDs overlap with segmentally duplicated
regions. As shown in Fig. 4A and B and Table 1, all
LoDs on the X chromosome are found to contain segmentally
duplicated regions. The use of the MspI
control is a particular strength of the HELP
assay because it removes the potentially confounding effect of
copy number variation. 12 Combined with a probe
design that selects only unique sequences for hybridization,
these aspects ensure that the DNA methylation
readout from regions of constitutive segmental duplication
accurately reflects the underlying DNA methylation
and is not influenced by DNA copy number.
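
Why an MspI control removes the copy-number effect can be illustrated with a toy calculation. This is a simplified sketch, not the HELP analysis pipeline: it assumes that the signal from each digest is proportional to copy number, and that the methylation readout is the log2 ratio of the HpaII signal to the MspI signal. The function and parameter names are hypothetical.

```python
import math


def log_ratio(copy_number: int, unmethylated_fraction: float,
              probe_efficiency: float = 1.0) -> float:
    """Toy model of a HpaII/MspI log2 ratio for one CCGG segment.

    MspI cuts every copy of the segment; HpaII cuts only the unmethylated
    copies. Both signals scale with copy number, so it cancels in the ratio.
    `unmethylated_fraction` must be greater than 0.
    """
    mspi_signal = probe_efficiency * copy_number
    hpaii_signal = probe_efficiency * copy_number * unmethylated_fraction
    return math.log2(hpaii_signal / mspi_signal)


# The same methylation state gives the same readout for 1 copy or 30 copies.
print(log_ratio(1, 0.5), log_ratio(30, 0.5))
```

In this model the probe efficiency cancels in the same way, which is why the ratio, rather than the raw HpaII signal, is the quantity of interest in duplicated regions.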

Hypomethylation of two such domains, LoD 10 and
12, was confirmed by Southern blot analysis (Fig. 4C
and D). Since LoD 10 and 12 contain homologous,
locally repeated sequences, a single hybridization probe can
be used to assess the methylation status of both
regions. The genomic DNAs of the thymus, brain and
testis were digested by either methylation-sensitive
HpaII or the methylation-insensitive MspI. In the HpaII
digests of thymus and brain DNA, no bands were
detected except for a hybridization signal in the unresolved
part of the lanes, indicating that the genomic
region is hypermethylated in somatic organs. In the
HpaII digest of testis DNA, many bands were detected,
and the band pattern was essentially the same as that
found in the MspI digest, clearly indicating that this
region is largely unmethylated in the testis. Given that
the testis comprises both germ and somatic cells, we
asked whether LoDs are hypomethylated in germ
cells. We used an Mvh 35 -Venus reporter transgenic
mouse line, in which germ cells are marked by Venus
fluorescent protein (Mise and Abe, unpublished
results). We also used FACS to purify the Venus-positive
germ cells from the adult testis and performed
Southern analysis. The results indicated that the
genomic regions in the purified germ cells are indeed
hypomethylated (Fig. 4C and D).
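
The logic of reading methylation from the HpaII and MspI band patterns can be sketched with a small in silico digest. The example is illustrative only: site positions and the fragment length are invented, and the model considers just one fact about the enzymes, namely that both recognize CCGG but HpaII does not cut when the site is methylated.

```python
def digest(length: int, ccgg_sites: list[int], methylated: set[int],
           enzyme: str) -> list[int]:
    """Return fragment sizes after digesting a linear fragment of `length` bp.

    MspI cuts every CCGG site; HpaII cuts only sites that are not methylated.
    """
    if enzyme == "MspI":
        cuts = sorted(ccgg_sites)
    elif enzyme == "HpaII":
        cuts = sorted(s for s in ccgg_sites if s not in methylated)
    else:
        raise ValueError(f"unknown enzyme: {enzyme}")
    bounds = [0] + cuts + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


sites = [1000, 3000, 6000]  # hypothetical CCGG positions

# Fully methylated DNA: HpaII leaves one large fragment, which would run in
# the unresolved part of the lane.
print(digest(10_000, sites, methylated=set(sites), enzyme="HpaII"))

# Unmethylated DNA: the HpaII pattern matches the MspI pattern.
print(digest(10_000, sites, methylated=set(), enzyme="HpaII"))
print(digest(10_000, sites, methylated=set(sites), enzyme="MspI"))
```

An HpaII pattern identical to the MspI pattern therefore indicates an unmethylated region, whereas a signal confined to high-molecular-weight DNA indicates methylation.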

3.8. Predominance of genes expressed in male germ cells 
or in the testis in LoDs 

We noticed that most LoDs contain genes that are
expressed in the testis. For example, Gmcl1l (germ
cell-less protein-like 1-like), Ssx9, Fthl17, Xmr, Mageb,
Ott, Samt4 and Magea are expressed in the testis and
are included in LoDs 1, 2, 3, 4, 11, 14 and 15, respectively
(Table 1). Expression of these genes is also
detected in germ cells purified from adult testes (data
not shown). If we omit LoDs 10, 12 and 16, which do
not carry known genes, only LoDs 6 and 13 do not
contain genes predominantly expressed in germ cells
(Table 1 and Supplementary Table S3). The mean expression
levels of genes contained in LoDs are shown
in Fig. 5A. Genes within LoDs show significantly higher
expression in the testis than in the brain. Figure 5B
shows the mean levels of DNA methylation within and
outside LoDs on the mouse X chromosome. Figure 5
indicates that there is an inverse correlation between
the level of DNA methylation and the expression of
genes in LoDs.
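
The comparison summarized in Fig. 5 (a size-matched random sample of data points from outside LoDs, followed by a Wilcoxon-type test) can be sketched as follows. This is an assumption-laden illustration: it uses the two-sample rank-sum form of the test with a normal approximation and no tie or continuity correction, which may differ from the exact procedure used for the figure, and the numbers are invented.

```python
import math
import random


def matched_sample(outside: list[float], n: int, seed: int = 0) -> list[float]:
    """Randomly draw n points from outside LoDs to match the within-LoD count."""
    return random.Random(seed).sample(outside, n)


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided rank-sum test (normal approximation). Returns (U, p)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):  # tied values share the average rank
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))


# Invented expression values, for illustration only.
within_lod = [5.0, 6.0, 7.0, 8.0, 9.0]
outside_lod = [0.0, 1.0, 2.0, 3.0, 4.0, 0.5, 1.5, 2.5]
u, p = rank_sum_test(within_lod, matched_sample(outside_lod, len(within_lod)))
print(u, p)
```

Matching the number of data points keeps the two groups comparable in size, as described in the legend of Fig. 5.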

3.9. Genomic structures of LoDs: Xmr/Slx and Mageb 
regions 

In addition, we describe the detailed structures
of two LoD regions (Fig. 6). LoD 4 is ~9.1 Mb in size
(chrX: 22 991 291-32 117 922) and contains three
distinct genes/gene families, all of which are expressed
specifically in the testis. Because Xmr is synonymous
with Slx, 29,30 we call this gene/gene family either
Xmr or Xmr/Slx in this study. Xmr/Slx is known to be
expressed in spermatids, where it encodes a protein,
SLX/XMR, normally localized in the cytoplasm. 29,30
Xmr/Slx represents a locally duplicated multigene family,
whose copy number is at least 28 in LoD 4. Gmcl1l
and LOC236749 are included in the same LoD, and
both are expressed in the testis and in purified male
germ cells (Fig. 6A; data not shown).
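
As a quick arithmetic check, the size of LoD 4 follows directly from its coordinates. The helper below is a hypothetical convenience function, not part of any genome browser API; it simply parses a "chrom:start-end" string and subtracts.

```python
def span_length(region: str) -> tuple[str, int]:
    """Parse 'chrom:start-end' (spaces allowed) and return (chrom, end - start)."""
    chrom, coords = region.replace(" ", "").split(":")
    start, end = (int(c) for c in coords.split("-"))
    return chrom, end - start


chrom, length = span_length("chrX: 22 991 291-32 117 922")
print(chrom, length, f"~{length / 1e6:.1f} Mb")
```

The result, 9 126 631 bp, agrees with the length listed for LoD 4 in Table 1.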

The LoD 4 region represents one of the largest segmentally
duplicated regions on the mouse X chromosome
(Katsura and Satta, personal communication)
and can be divided into four subregions (Supplementary
Fig. S4). Subregion I spans ~3 Mb and harbours
tandemly repeated Xmr genes. Subregion II spans
~3.8 Mb and comprises both tandem and inverted
repeats (IRs) of Xmr genes. Subregion III contains
tandem and IRs of Gmcl1l genes, which are duplicated
on two distant sites on the X chromosome; the other



Table 1. List of LoDs on the mouse X chromosome

| LoD no. | Position | Length (bp) | Segment number on HELP array | Methylation (brain M-value/testis M-value) | CGI Seg Dups | Gene families | Description | Duplications: total | Duplications: within LoD | Duplications: outside LoD on chrX | Human CTA genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | chrX:3035387-4561626 | 1 526 239 | 12 | -2.52 | 1 344 | Gmcl1l | Germ cell-less protein-like 1 | 11 | 8 | 3 | |
| 2 | chrX:7857031-7924410 | 67 379 | 5 | -1.924 | 88 | Ssx9 | Synovial sarcoma, X breakpoint 9 | 11 | 4 | 7 | CTA |
| 3 | chrX:8116622-8236784 | 120 162 | 19 | -1.82 | 45 | Fthl17 | Ferritin, heavy polypeptide-like 17 | 7 | 6 | 1 | CTA |
| 4 | chrX:22991291-32117922 | 9 126 631 | 100 | -1.4745 | 10 697 | Xmr | XMR protein (Xlr-related, meiosis regulated) | 32 | 28 | 4 | |
| | | | | | | Gmcl1l | Germ cell-less homologue 1 (Drosophila)-like | 11 | 2 | 9 | |
| | | | | | | LOC236749 | Hypothetical protein LOC236749 | 1 | 1 | 0 | |
| 5 | chrX:50380769-52184422 | 1 803 653 | 48 | -1.75 | 11 971 | Similar to Xmr protein | Adult male testis cDNA, RIKEN full-length enriched library, clone: 4930527E24 product: weakly similar to XMR PROTEIN | 30 | 16 | 14 | |
| 6 | chrX:57970833-58090999 | 120 166 | 5 | -2.14 | 23 | Ldoc1 | Leucine zipper, down-regulated in cancer 1 | 1 | 1 | 0 | |
| 7 | chrX:58775958-58815172 | 39 214 | 13 | -3.45 | 10 | 1700019B21Rik | Mus musculus adult male testis cDNA, M. musculus RIKEN cDNA 1700019B21 gene (1700019B21Rik), transcript variant 1, non-coding RNA | 1 | 1 | 0 | |
| 8 | chrX:72448732-72544743 | 96 011 | 4 | -1.59 | 67 | LOC238829 | Hypothetical protein LOC238829 (AK133378: M. musculus adult male testis cDNA, RIKEN full-length enriched library, clone: 4933402E19 product: hypothetical protein, full insert sequence) | 4 | 1 | 3 | |
| 9 | chrX:84663998-86056286 | 1 392 288 | 23 | -1.2207 | 334 | Pet2 | Plasmacytoma expressed transcript 2 | 1 | 1 | 0 | |
| | | | | | | 4932429P05Rik | M. musculus RIKEN cDNA 4932429P05 gene (4932429P05Rik), mRNA. (M. musculus adult male testis cDNA) SMEK homologue 3, putative | 1 | 1 | 0 | |
| 10 | chrX:87564706-87579103 | 14 397 | 20 | -3.70 | 1 | | (up site of Mageb1, Mageb2) | | | | |
| 11 | chrX:87598220-88271067 | 672 847 | 15 | -1.0483 | 43 | Mageb5 | Melanoma antigen, family B, 5 | 2 | 2 | 0 | CTA |
| | | | | | | Mageb1 | Melanoma antigen, family B, 1 | 2 | 1 | 1 | CTA |
| 12 | chrX:88271564- | 14 137 | 21 | -3.76 | 1 | | (up site of Mageb1, Mageb2) | | | | |
site is also classified as LoD 1 (Table 1). Subregion IV is
<1 Mb and contains tandem repeats of Xmr genes.
One hundred and forty-one CCGG segments are
mapped within LoD 4, and the fold difference in methylation
level (brain versus testis) of these segments was
calculated as described in Fig. 4A. The mean value is
-1.4745, suggesting that the CCGG segments in this
region are generally hypomethylated in the testis
genome (Table 1 and Figs 3 and 4).

Because of the repetitive nature of the LoD region,
HELP probes cannot be assigned for most of
subregions II and IV. To examine DNA methylation in these
regions, we performed Southern analysis of the testis
and brain DNA digested with either HpaII or MspI, and
hybridized with an Xmr cDNA probe. As shown in
Fig. 6A and Supplementary Fig. S4, the Xmr cDNA
probe should be able to assess the methylation status
of 161 restriction fragments. These fragments are distributed
evenly within subregions I, II and IV, and fill
the gaps of information provided by the nanoHELP
assay, which tests only unique sequences. The results
of the Southern blot analysis shown in Fig. 6B demonstrate
that the Xmr region is highly methylated in the
brain and liver, whereas a considerable proportion of
the restriction fragments appear unmethylated in the
testis.

It has been suggested that transcriptionally active 
genes are hypomethylated in their promoter region, 
while their gene bodies tend to be hypermethylated. 6 
However, a magnified view of the LoD 4 region 
(Supplementary Fig. S5) indicates that all CCGG seg- 
ments in this region are hypomethylated in the testis 
and male PGCs regardless of their positions with 
respect to the Xmr genes. Both the probes positioned 
near the transcription start sites and the probes posi- 
tioned at introns or even at intergenic regions are 
unmethylated in the testis, GS and male PGCs. The
Southern blot analysis data suggest that the CCGG segments
containing exons of the Xmr genes are
relatively hypomethylated in the testis (Fig. 6B). These
results imply that methylation of the whole LoD 4 is
subject to region-wide regulation. This feature is
shared by other LoDs not described here.

Mageb belongs to the Mage (melanoma antigen)
gene family, which is expressed in spermatogenic cells
and in some cancer cells. 31 Figure 6C shows a genomic
region spanning ~1 Mb that contains the Mageb1 and
Mageb2 genes. This region represents a large IR with
arms of ~400 kb in length. At the ends of both arms,
LoDs 10 and 12 are located 4-2 kb upstream of the
transcription start sites of Mageb1 and Mageb2, respectively.
These two LoDs do not contain the Mageb
locus itself (Supplementary Fig. S6). Both LoDs are
highly homologous and ~14 kb long, and comprise
repeat sequences with a unit size of ~3 kb. These
sequences are both tandem and IRs (Fig. 6C; magnified
Figure 5. Inverse relationship between DNA methylation and expression of genes within the LoDs. (A) A plot of the expression levels of genes
contained in LoDs, and regions outside LoDs on the mouse X chromosome. Gene expression data were obtained from the Affymetrix Exon
array data set. 43 The expression values of exons contained in LoDs were averaged and plotted, and the data points from regions outside LoDs
were similarly averaged. For statistical analysis of the data from regions outside LoDs, the same number of data points as those used to analyse
within LoDs were randomly selected and used. (B) Mean methylation levels of genomic DNA within LoDs. The mean M-values from the CCGG
segments contained in LoDs or in regions outside LoDs are shown. Statistical significance was tested by the Wilcoxon test, and P-values are
shown within the figures.



part), are found only in these LoD regions and are
clearly hypomethylated only in germ cells (Supplementary
Fig. S6). Hypomethylation of LoDs 10 and 12 was
confirmed by Southern blot analysis as described
(Fig. 4C and D).

3.10. Developmental changes in the methylation levels
of LoDs

LoDs are hypomethylated in the testis, GS cells and
male PGCs. The methylation heat maps of LoDs 10
and 12 shown in Fig. 6D illustrate how the DNA methylation
of LoDs changes during germ cell development.
During development, the PGC genome undergoes
global DNA demethylation, which is known to be completed
between E11.5 and E13.5. 2 In E10.5 PGCs, the
LoDs tested here are not completely unmethylated,
whereas demethylation of LoD DNA progresses in
PGCs by E13.5. At E17.5, the LoD regions are largely
unmethylated in both male and female PGCs. This
trend persists in later stages of male germ cells,
whereas the methylation levels of the LoDs appear to
increase in newborn oocytes. These results, together
with the results shown in Supplementary Fig. S3,
suggest that, in general, LoDs begin to form between
E10.5 and E13.5, and distinct hypomethylated
domains are established around E13.5 in the male
germline. Despite the global increase in DNA methylation
at later stages of male germline development
(Fig. 2B), hypomethylation of LoD DNAs is maintained
in male germ cells. Although LoDs 10 and 12 are
shared by male and female PGCs, the overall DNA
methylation patterns are not identical, suggesting
that a distinct epigenomic status is generated in male
and female germlines (Supplementary Fig. S3).

3.11. Coincidence of most LoDs with broad domains of the 
repressive histone mark, H3K9 dimethylation 

We have shown that most LoDs are broad genomic
domains with low DNA methylation levels that form
boundaries between the LoDs and other methylated
parts of the genome. The mammalian genome can be
divided into broad domains of distinct histone modifications.
36,37 For example, LOCKs (large organized chromatin
K9 modifications) are genomic domains with
histone H3 lysine 9 dimethylation (H3K9me2) modification
thought to be involved in region-wide gene repression.
37 To investigate the relationship between
LoDs and the repressive histone mark, we performed
ChIP-on-chip analysis to detect H3K9me2 enrichment
in GS cells as a representative of germ cells in this test
and in cumulus (somatic cells in the ovary) cells as a
somatic cell control. 26 Figure 7 shows the H3K9me2
Figure 6. Genomic structures of LoDs: Xmr and Mageb. (A) Genomic structure of LoD 4 (Xmr). The top section represents a similarity dot plot 44 of
LoD 4 and flanking regions (chrX: 21 000 000-32 940 000; UCSC mm8). Similarities of the sequences are colour-coded as shown by the
colour bar on the right (high similarity in dark red, 100% similarity; low similarity in blue, 70% similarity). The horizontal lines represent
direct repeats, and the vertical lines indicate IRs. The middle section depicts the locations of genes contained in LoD 4 (coloured rectangle).
Exon array expression data for the brain and testis 43 are shown at the bottom (high gene expression in red and low expression in green).
The positions of sequences with homology to the Xmr cDNA probe are also shown by grey vertical bars (Southern probe). (B) Southern blot
probed with the Xmr cDNA. Genomic DNAs from three organs were digested with either MspI or HpaII. The Xmr cDNA probe (708 bp) was
amplified from testis cDNA using primers Xmr FW: 5'-AAGGGTGCAGTTGTGAAGGT-3' and Xmr RV: 5'-TGTTGGTCTCCATGTTCATCA-3'. The
hybridization signals in the unresolved part of the testis DNA blot are likely to reflect non-specific cross-hybridization. Southern blot
analysis data using DNA doubly digested by BamHI plus either HpaII or MspI confirmed this notion (data not shown). (C) Genomic
structure of LoDs 10, 11 and 12 (Mageb1/b2). Top: similarity dot plot. The vertical lines and coloured arrows indicate positions of IRs. The
colouring of the similarities is the same as found in Fig. 6A. Homologous repeats are represented by the same colour. A magnified view of
the IRs contained in LoDs 10 and 12 is presented on the left. Gene expression data for the brain and testis are shown at the bottom. The
positions of sequences with homology to a probe used for Southern blot analysis (Fig. 4C and 4D) are shown by grey bars (Southern
probe). (D) A DNA methylation heat map of LoDs 10 and 12. DNA methylation levels of the CCGG segments are represented as a heat map
(unmethylated segments in dark blue, M-value = 6.00; highly methylated segments in dark red, M-value = -6.00).



modification patterns on the X chromosome in both GS
and cumulus cells. The overall pattern of H3K9me2
modifications along the X chromosome in GS cells is
essentially similar to that in cumulus cells (Fig. 7A and C).
Enrichment of the modifications along the LoD regions
(coloured light blue) is seen in both GS and cumulus
cells (Fig. 7A and C). In contrast, as expected, DNA
methylation levels in the LoD regions are high in
cumulus cells and low in GS cells (Fig. 7B and D).
Figure 7E shows a magnified view of LoD 12, indicating
that the hypomethylated region has the H3K9me2
mark. A significant enrichment of H3K9me2 is found
in most (11 of 16) LoDs in GS cells (Supplementary
Fig. S7).
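
One simple way to quantify how far an LoD coincides with H3K9me2-enriched domains is the fraction of its length covered by such domains. The sketch below is illustrative only: it is not the enrichment test used here, the function names are hypothetical, the enriched-domain coordinates are invented, and the domains are assumed not to overlap one another.

```python
Interval = tuple[int, int]  # (start, end) on the same chromosome


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def covered_fraction(lod: Interval, domains: list[Interval]) -> float:
    """Fraction of the LoD covered by enriched domains.

    Assumes the enriched domains do not overlap one another.
    """
    covered = sum(overlap_bp(lod, d) for d in domains)
    return covered / (lod[1] - lod[0])


lod_10 = (87564706, 87579103)             # coordinates from Table 1
enriched = [(87_560_000, 87_575_000)]     # invented H3K9me2 domain
print(round(covered_fraction(lod_10, enriched), 2))
```

A coverage threshold (for example, more than half of the LoD) would then be an analyst's choice and is not specified in the text.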

Expression of six genes contained in the LoDs was 
examined in cumulus, GS, testis and two other 
somatic cell types by quantitative reverse transcription 
polymerase chain reaction (RT-PCR) analysis (Fig. 7F). 
Fthl17, Ott, Mageb and Magea are included in the 
LoDs and are expressed in the testis; the expression 
level of these genes is much higher in GS cells, but 
only negligible expression is detected in somatic cells. 
This result indicates that genes in hypomethylated LoDs
can be expressed even though the same region has con- 
tinuous H3K9me2 modifications (Fig. 7A and C and 
Supplementary Fig. S7), demonstrating peculiar epige- 
nomic features of LoD regions. It is reasonable to 
expect that Ssx and Xmr are barely detectable in GS 
cells, because these genes become active in post- 
meiotic stages, 16 whereas GS cells are derived from 
pre-meiotic spermatogonia. It is probable that, in GS 
cells, other factors required for the expression of post- 
meiotic genes (e.g. transcription factors) are lacking. 
These results suggest that DNA hypomethylation in 
LoDs may not be sufficient by itself, but is a prerequisite 
for the expression of LoD genes. 

4. Discussion 

In the present study, we have analysed the DNA
methylation profiles of developing germ cells using
the modified nanoHELP method, which requires only
a limited amount of DNA. Recent studies by Guibert
et al. 9 using methylated DNA immunoprecipitation
analysis of a promoter array and Seisenberger et al. 10
using whole-genome bisulphite sequencing suggest
that DNA demethylation of the PGC genome is initiated
earlier than previously thought. 1,2 Our finding that the
PGC genome is substantially hypomethylated already at
E10.5 is consistent with the result of Seisenberger
et al., 10 confirming the technical reliability of our
method. Our data from developing germ cells revealed
for the first time the presence of large, hypomethylated
DNA domains on the X chromosome of male germline
cells in mice.

4.1. Discovery of large hypomethylated domains
of epigenomic organization
Traditionally, epigenetic studies have focused on 
modifications of genes or elements adjacent to genes. 
However, with the development of genome-wide 
assays, recent studies have revealed marked clustering 
of particular histone modifications over relatively 
large genomic regions; e.g. LOCKs and BLOCs (broad 
local enrichments) enriched with the histone marks 
H3K9me2 and H3K27me3, respectively. 36,37 These 
large epigenetic marks, LOCKs in particular, are 
thought to be involved in gene silencing. DNA methyla- 
tion is found throughout the mammalian genome 
except for short unmethylated regions, CGIs, which typ- 
ically occur around the transcription start sites of 
genes. 6 The LoDs described in this work are also hypo- 
methylated genomic regions, but are distinct from 
CGIs in terms of their size, tissue specificity and 
genomic structure. To our knowledge, large differentially
methylated DNA regions showing germ cell specificities,
such as LoDs, have not been previously reported.



This may be because previous studies have focused only 
on methylation of gene promoters and not broader 
genomic contexts in germ cell samples. In contrast, 
our custom HELP chip method could assess the DNA 
methylation status of both genic and intergenic
regions using the meager amounts of DNA that could 
be sampled from germ cell genomes in this study. 

Seisenberger et al. 10 recently reported the results of
whole-genome bisulphite sequencing analysis of the
mouse PGC genome. We analysed their data on E16.5
male PGCs to determine whether LoDs could be
found at the single-nucleotide level and found that
the number of sequence reads mapped to LoDs was
significantly lower than that mapped to the flanking
regions (data not shown). Given that mapping of
bisulphite-converted short sequence reads onto locally
duplicated regions is technically challenging, the probability
of finding LoDs using the bisulphite sequencing
data currently available seems low. In contrast, the
HELP assay uses an MspI control to remove the potentially
confounding effect of copy number variation 12
along with a probe design that selects unique sequences
for hybridization. These ensure that the DNA methylation
readout from regions of segmental duplication is
genuinely reflective of the underlying DNA methylation.
Oda et al. 38 reported that CGI methylation of
an X-linked homeobox gene cluster spanning ~1 Mb
is under long-range regulation in a tissue-specific
manner. Therefore, widespread changes in DNA methylation
could occur depending on the cellular phenotype
or differentiation status.



4.2. Peculiar epigenomic features of LoDs 

LoDs have been detected based on arbitrary criteria,
but most share common features. Most LoDs represent
segmental duplications that harbour germline-expressed
genes and overlap with large H3K9me2-enriched
domains. Wen et al. 37 described large
H3K9me2-enriched chromatin blocks, LOCKs, in the
human and mouse. The occurrence of LOCKs is differentiation
specific: there are more LOCKs in differentiated
cells, and genes contained in the LOCKs tend to be
repressed during differentiation. Because LOCKs substantially
overlap with lamin B-associated domains, a
gene-silencing mechanism based on three-dimensional
subnuclear organization has been proposed. 37 We
found that most LoDs are enriched with H3K9me2
modifications, and that at least four LoDs (1, 2, 3 and
4) correspond to the LOCKs described by Wen et al. 37
(data not shown). Overlaps of other LoDs with LOCKs
cannot be checked because LOCKs data are not available
for the rest of the mouse X chromosome. Overlap
of LoDs with LOCKs is counterintuitive because LOCKs
are supposed to repress gene expression, whereas
genes can be highly expressed within LoDs. This may
Figure 7. Large stretches of H3K9 dimethyl modifications overlapping LoDs. (A) ChIP-on-chip data for the H3K9me2 modification along the X
chromosome in GS cells. The green dots represent individual probe data. The positions of LoDs are shown in light blue. (B) DNA methylation
profile of GS cells within the X chromosome. The blue line denotes the average M-values. The positions of LoDs are shown in light blue. (C)
ChIP-on-chip data for H3K9me2 modification along the X chromosome in cumulus cells. (D) The DNA methylation profile of cumulus cells
within the X chromosome. (E) A magnified view of the LoD 12 region. Top, the H3K9me2 modification in GS cells (blue) and in cumulus cells
(pink). Middle, DNA methylation data for E10.5 male PGCs, E13.5 male PGCs, E17.5 male PGCs, P0.5 spermatogonia, GS cells, testis, brain,
thymus and cumulus cells. Bottom, positions of LoD and Mageb1/b2 genes. (F) Quantitative reverse transcription polymerase chain reaction
(RT-PCR) expression analysis of genes in LoD regions. The Ssx9 (Ssx), Fthl17 (Fthl), Xmr, Mageb1, Ott and Magea8 genes are located within
LoDs on the X chromosome. Beta-actin (ACTB) and Gapdh (GAPD) genes were also examined. The expression level in the testis is set at 1.0, and
the expression levels in other cells and tissues (brain, cumulus cells, GS cells and liver) relative to that in the testis are shown. Primers used for
this RT-PCR analysis are listed in Supplementary Table S4A.



be reconciled if we assume that gene silencing in LoDs is 
complete when both DNA methylation and H3K9me2 
marks are established, but is derepressed in the absence 
of DNA methylation. Consistent with this idea, somatic 
cells such as cumulus cells, which have both marks, do 
not express the LoD genes, although germ cell genes 
can be active in DNA-hypomethylated but H3K9- 
dimethylated LoDs. The H3K9me2 histone methyl- 
transferases, G9a and GLP, are required for DNA 
methylation in ES cells, but not in cancer cells. 39 It is 
thus likely that DNA methylation and the H3K9me2 
modification are not always interdependent, and that 
they can be regulated independently in the LoD 
regions of male germ cells and cancer cells. 
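
The working hypothesis outlined above can be stated compactly as a decision rule. This is a deliberately simplified sketch of the reasoning, not an established regulatory model: the function name and the state labels are hypothetical, and the case of DNA methylation without H3K9me2 is left undetermined because the observations in this study do not address it.

```python
def lod_gene_state(dna_methylated: bool, h3k9me2: bool,
                   stage_factors_present: bool) -> str:
    """Predicted state of an LoD gene under the hypothesis discussed here.

    - Both marks present: silencing is complete.
    - DNA hypomethylated: the gene is permissive even if H3K9me2 is present,
      but is expressed only if the stage-specific factors are available.
    """
    if dna_methylated:
        return "silenced" if h3k9me2 else "undetermined"
    if not stage_factors_present:
        return "permissive but inactive"
    return "expressed"


# Cumulus cells: both marks present.
print(lod_gene_state(True, True, False))
# GS cells, gene active in spermatogonia (e.g. Fthl17).
print(lod_gene_state(False, True, True))
# GS cells, post-meiotic gene (e.g. Xmr): hypomethylated but factors lacking.
print(lod_gene_state(False, True, False))
```

In this rule the H3K9me2 mark alone never determines the outcome in hypomethylated DNA, which reflects the observation that genes can be expressed within H3K9-dimethylated LoDs.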



4.3. Segmental duplication, hypomethylation and gene 
expression in germ cells and cancer cells 
Through the analysis of germ cell-specific hypo- 
methylated regions, we found that LoDs overlap with 
large segmentally duplicated regions, within which 
germ cell-expressed genes are commonly found. Some 
of these genes, such as Xmr, are found only in rodents. 
In contrast, the Mage gene family genes, Ssx and 
Fthl17, are conserved in the human genome and are
known as CTA genes, which are expressed specifically 
in germ cells and in some tumour cell types. More 
than 260 CTA genes have been detected in the 
human (http://www.cta.lncc.br/), and half of them 
are on the X chromosome. Most of the X-linked CTA 
genes are organized as multicopy gene families. 18
Warburton et al. 40 searched for IR structures in the
human genome and found that the X chromosome is
replete with large IRs harbouring testis genes, most of
which are CTA genes. More than 40% of large IRs
found in the mouse genome are on the X chromosome,
and Ssx, Fthl17 and the Xmr loci are contained in such
regions. Thus, three kinds of studies with different starting
points reached the same conclusion: the X chromosome
is rich in duplicated regions containing
germ cell-expressed genes, including CTA genes. To
this, we add the new observation that these regions
also have unique epigenomic features, i.e. widespread
DNA hypomethylation and H3K9me2 enrichment.
The epigenomic features of the LoDs could account
for the finding that CTA genes can be activated by inhibition
of DNA methylation but not by a reduction in
H3K9 dimethylation, 39,41 and suggest that DNA
methylation is the key epigenetic mechanism involved
in the regulation of LoD-CTA genes. It is not fully
understood how DNA methylation regulates the coordinated
expression of CTA genes in a cell type-specific
manner. It is also necessary to clarify whether CTA
gene expression contributes directly to oncogenesis or
whether it simply reflects global chromatin changes
that occur during tumour formation. Simpson et al. 42
postulated an intriguing hypothesis that the aberrant
expression of germline genes in cancer reflects the activation
of the gametogenic programme, which is normally
silenced in somatic cells. The gametogenic
programme is normally repressed because germline-specific
products would be harmful for normal
somatic cells, whereas they would be advantageous
for cancer cells. To test this hypothesis, it will be essential
to elucidate the activation mechanism for the germline
gene expression programme, as well as the
epigenetic and chromatin status required for the operation
of this programme. As shown in this study, widespread
DNA hypomethylation may be a prerequisite
for the activation of LoD genes, including CTA genes.
In addition to DNA methylation, the nuclear chromatin
environments within germ cells and/or tumour cells
may also be important for long-range transcriptional
control over large genomic regions, because LOCKs, 37
LoDs and the partially methylated domains found in
colorectal cancer 7 are correlated with nuclear lamina-associated
domains. Therefore, further studies of epigenomic
features and the nuclear architecture of LoDs
may shed light on the germline gene expression programme
and its relationship to oncogenesis. 42